Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations. In practice, adversarial training leads to low robust training loss. However, a rigorous explanation for why this happens under natural conditions is still missing. Recently, a convergence theory of standard (non-adversarial) supervised training was developed by various groups for {\em very overparametrized} nets. It is unclear how to extend these results to adversarial training because of the min-max objective. Recently, a first step in this direction was made by Gao et al. using tools from online learning, but they require the width of the net to be \emph{exponential} in the input dimension $d$, and they use an unnatural activation function. Our work proves convergence to low robust training loss for \emph{polynomial} width instead of exponential, under natural assumptions and with ReLU activations. A key element of our proof is showing that ReLU networks near initialization can approximate the step function, which may be of independent interest.
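The step-function approximation mentioned above can be illustrated concretely: a difference of two shifted ReLUs forms a narrow ramp that converges to the indicator $\mathbb{1}[x \ge 0]$ as the ramp width shrinks. The sketch below is illustrative only; the function name `soft_step` and the width parameter `delta` are our own, not the paper's construction.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def soft_step(x, delta=1e-2):
    # A ramp of width delta built from two ReLUs:
    # 0 for x <= -delta, linear on [-delta, 0], 1 for x >= 0.
    # As delta -> 0 this approximates the step function 1[x >= 0].
    return (relu(x + delta) - relu(x)) / delta

x = np.linspace(-1.0, 1.0, 9)
print(soft_step(x, delta=1e-3))  # close to 0 for x < 0, close to 1 for x >= 0
```

Since the ramp is a fixed linear combination of ReLU units, a wide network whose first-layer weights are near random initialization can realize it by perturbing only a few neurons, which is the kind of argument a near-initialization analysis needs.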
Review for NeurIPS paper: Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
Weaknesses: Although this paper is an obvious improvement on Gao et al.'s work, I have to say that it lacks novelty, and the contribution is small. The width, running time, and activation function are not a huge gap in Gao et al.'s work. What I really want to see is removing the projection step in deep nets, improving the online learning analysis, or analyzing robust generalization. However, both the results and the proof methods offer little inspiration. This paper claims that '(Gao et al.) require the width of the net and the running time to be exponential in input dimension d, and they consider an activation function that is not used in practice' in the abstract.
Review for NeurIPS paper: Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality
This paper proves that adversarial training of over-parameterized neural networks converges to a robust solution. Specifically, the paper studies two-layer ReLU networks with width that is polynomial in the input dimension, d, the number of training points, n, and the inverse of the robustness parameter, 1/\epsilon. The proof is by construction; an algorithm is proposed that, in poly(d, n, 1/\epsilon) iterations, finds a network with poly(d, n, 1/\epsilon) width that is \epsilon-robust. Adversarial training is an important and rapidly expanding field of ML. This paper fills in some gaps w.r.t.
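The min-max objective the reviews refer to can be made concrete with a toy training loop: an inner maximization finds a worst-case $\ell_\infty$ perturbation of each training point, and an outer minimization takes a gradient step on the resulting robust loss. The sketch below uses a linear model with logistic loss purely for brevity (the inner max then has the closed-form FGSM solution); the paper itself analyzes two-layer ReLU networks, and all names and hyperparameters here are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points in d dimensions, labels from the first coordinate.
d, n, eps, lr = 5, 40, 0.1, 0.05
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])

w = np.zeros(d)  # linear model for simplicity; the paper studies 2-layer ReLU nets

def logistic_grad(w, X, y):
    # Gradient of the average logistic loss (1/n) sum log(1 + exp(-y_i w.x_i)).
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))
    return (coeff[:, None] * X).mean(axis=0)

for t in range(200):
    # Inner max: for a linear model the worst l_inf perturbation of radius eps
    # is the closed-form FGSM step, delta_i = -eps * y_i * sign(w).
    X_adv = X - eps * y[:, None] * np.sign(w)[None, :]
    # Outer min: gradient step on the robust (adversarial) loss.
    w -= lr * logistic_grad(w, X_adv, y)

robust_margins = y * (X_adv @ w)  # positive entries are robustly classified
```

For neural networks the inner maximization has no closed form, which is why convergence analyses of this loop (including the algorithm described in the review above) are substantially harder than for standard training.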